Multi-Level Cross-Modal Semantic Alignment Network for Video–Text Retrieval
Abstract
This paper strives to improve the performance of video–text retrieval. To date, many algorithms have been proposed to facilitate the similarity measure for retrieval, moving from a single global semantic level to multi-level semantics. However, these methods may suffer from the following limitations: (1) they largely ignore the relationship between semantic levels, which results in insufficient constraints across levels; (2) it is incomplete to constrain the real-valued features of different modalities to lie in the same space only through feature distance measurement; (3) they fail to handle the problem that the distributions of attribute labels are heavily imbalanced. To overcome the above limitations, this paper proposes a novel multi-level cross-modal semantic alignment network (MCSAN) for video–text retrieval by jointly modeling similarity on the global, entity, action and event levels within a unified deep model. Specifically, both video and text are first decomposed into these levels by carefully designing spatial–temporal semantic learning structures. Then, we utilize KLDivLoss and a parameter-shared projection layer as statistical constraints to ensure that the representations of the two modalities are projected into a common space. In addition, a focal binary cross-entropy (FBCE) loss function is presented in an effort to model the unbalanced distribution of attribute labels. MCSAN is practically effective at taking advantage of the complementary information among the four semantic levels. Extensive experiments on two challenging datasets, namely, MSR-VTT and VATEX, show the viability of our method.
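The abstract names two concrete components: a KLDivLoss-based statistical constraint applied through a parameter-shared projection layer, and a focal binary cross-entropy (FBCE) loss for imbalanced attribute labels. The PyTorch sketch below illustrates how such pieces are commonly wired together; it is not the paper's implementation. The layer sizes, the focusing parameters gamma and alpha, the attribute-head dimension, and the choice of a softmax over embedding dimensions as the matched distribution are all illustrative assumptions.

# Minimal sketch (assumptions noted above), PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SharedProjection(nn.Module):
    """Projects video and text features into a common space with the same
    (parameter-shared) linear layer."""

    def __init__(self, dim_in: int = 1024, dim_common: int = 512):
        super().__init__()
        self.proj = nn.Linear(dim_in, dim_common)  # shared across modalities

    def forward(self, video_feat: torch.Tensor, text_feat: torch.Tensor):
        return self.proj(video_feat), self.proj(text_feat)


def kl_alignment_loss(video_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Statistical constraint: treat the softmax over embedding dimensions as a
    distribution and penalize the KL divergence between the two modalities."""
    log_p_video = F.log_softmax(video_emb, dim=-1)
    p_text = F.softmax(text_emb, dim=-1)
    # KLDivLoss expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_p_video, p_text, reduction="batchmean")


def focal_bce_loss(logits: torch.Tensor, targets: torch.Tensor,
                   gamma: float = 2.0, alpha: float = 0.25) -> torch.Tensor:
    """Focal binary cross-entropy: down-weights easy (well-classified) attribute
    labels so that rare attributes contribute more to the gradient."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)            # prob. of the true label
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()


if __name__ == "__main__":
    proj = SharedProjection()
    video = torch.randn(8, 1024)    # batch of pooled video features (assumed dim)
    text = torch.randn(8, 1024)     # batch of pooled caption features (assumed dim)
    v, t = proj(video, text)
    align = kl_alignment_loss(v, t)

    attr_logits = torch.randn(8, 300)                   # hypothetical 300 attributes
    attr_labels = (torch.rand(8, 300) > 0.95).float()   # sparse, imbalanced labels
    fbce = focal_bce_loss(attr_logits, attr_labels)
    print(align.item(), fbce.item())

In a full model, losses of this kind would be combined with the ranking objectives computed at each of the four semantic levels; the weighting between them is a design choice not specified in the abstract.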
Similar Resources
Multi modal multi-semantic image retrieval
MHTN: Modal-adversarial Hybrid Transfer Network for Cross-modal Retrieval
Cross-modal retrieval has drawn wide interest for retrieval across different modalities of data (such as text, image, video, audio and 3D model). However, existing methods based on deep neural network (DNN) often face the challenge of insufficient cross-modal training data, which limits the training effectiveness and easily leads to overfitting. Transfer learning is usually adopted for relievin...
Learning Deep Semantic Embeddings for Cross-Modal Retrieval
Deep learning methods have been actively researched for cross-modal retrieval, with the softmax cross-entropy loss commonly applied for supervised learning. However, the softmax cross-entropy loss is known to result in large intra-class variances, which is not very well suited for cross-modal matching. In this paper, a deep architecture called Deep Semantic Embedding (DSE) is proposed, which is ...
Correlation Hashing Network for Efficient Cross-Modal Retrieval
Due to the storage and retrieval efficiency, hashing has been widely deployed to approximate nearest neighbor search for large-scale multimedia retrieval. Cross-modal hashing, which improves the quality of hash coding by exploiting the semantic correlation across different modalities, has received increasing attention recently. For most existing cross-modal hashing methods, an object is first r...
Semiautomatic Image Retrieval Using the High Level Semantic Labels
Content-based image retrieval and text-based image retrieval are two fundamental approaches in the field of image retrieval. The challenges related to each of these approaches guide researchers toward combined approaches and semi-automatic retrieval that involve the user in the retrieval cycle. Hence, in this paper, an image retrieval system is introduced that provides two kinds of qu...
Journal
Journal title: Mathematics
Year: 2022
ISSN: 2227-7390
DOI: https://doi.org/10.3390/math10183346